FACTOR ANALYSIS BY GENERALIZED LEAST SQUARES
Karl G. Jöreskog
Educational Testing Service 
and 

Arthur S. Goldberger 
University of Wisconsin 

Abstract 

Aitken's generalized least squares (GLS) principle, with the inverse
of the observed variance-covariance matrix as a weight matrix, is applied
to estimate the factor analysis model in the exploratory (unrestricted)
case. It is shown that the GLS estimates are scale free and asymptotically
efficient. The estimates are computed by a rapidly converging Newton-Raphson
procedure. A new technique is used to deal with Heywood cases effectively.



FACTOR ANALYSIS BY GENERALIZED LEAST SQUARES

Karl G. Jöreskog*
Educational Testing Service
and
Arthur S. Goldberger*
University of Wisconsin

*The first author is Research Statistician at Educational Testing
Service, Princeton, N.J. The second author is Professor of Economics at
the University of Wisconsin. Work on this project was in part supported
by a grant from the Research Committee of the University of Wisconsin
Graduate School. The authors wish to thank Michael Browne for many
helpful comments and Marielle van Thillo for valuable assistance in the
numerical computations. This paper is also being distributed in the
Workshop Paper Series of the Social Systems Research Institute.

1. Introduction

Consider the factor analysis model,

(1)    $y = \Lambda f + u$ ,

where $y$ is a p x 1 vector of observable random variables, $\Lambda$ is a
p x k matrix of unknown factor loadings, $f$ is the k x 1 vector of
unobservable common factors and $u$ is a p x 1 vector of unobservable unique
factors or residuals. It is assumed that $E(f) = 0$, $E(ff') = I$, $E(u) = 0$
and $E(uu') = \Psi^2$, where $\Psi$ is a diagonal matrix. It is further assumed
that $u$ and $f$ are uncorrelated. (For convenience, a mean vector has been
suppressed in (1).) From these assumptions it follows that the
variance-covariance matrix $\Sigma$ of $y$ is

(2)    $\Sigma = \Lambda\Lambda' + \Psi^2$ .

The force of the model when k is small relative to p lies in the
constraints it imposes on this variance-covariance matrix: the
$r = \frac{1}{2}p(p+1)$ distinct elements of $\Sigma$ are expressed in terms of
the $(k+1)p$ unknown parameters in $\Lambda$ and $\Psi$. Since $\Lambda$ in (1)
may be postmultiplied by an arbitrary k x k orthogonal matrix without
changing $\Sigma$, $\Lambda$ may be chosen to satisfy $\frac{1}{2}k(k-1)$
independent conditions. Thus, the effective number of unknown parameters
is $s = (k+1)p - \frac{1}{2}k(k-1)$ and the degrees of freedom of the model is

(3)    $d = r - s = \frac{1}{2}[(p-k)^2 - (p+k)]$ .
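The parameter count behind (3) is easy to check numerically. The following is a minimal sketch; the function name `degrees_of_freedom` is introduced here for illustration only and is not part of the original report.

```python
def degrees_of_freedom(p: int, k: int) -> int:
    """Degrees of freedom d = r - s of the unrestricted factor model, eq. (3)."""
    r = p * (p + 1) // 2                 # distinct elements of Sigma
    s = (k + 1) * p - k * (k - 1) // 2   # effective number of unknown parameters
    return r - s


if __name__ == "__main__":
    # 9 variables with 3 factors, and 10 variables with 4 factors
    print(degrees_of_freedom(9, 3), degrees_of_freedom(10, 4))
```

The two calls correspond to the dimensions of the two numerical examples analyzed later in the paper.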

Let $S$ denote the p x p sample variance-covariance matrix of $y$
with n degrees of freedom obtained in a random sample of size n + 1.
The estimation problem of factor analysis is to use $S$ to develop
estimates of $\Lambda$ and $\Psi^2$. The factor analysis literature contains
alternative estimation procedures, many of which amount to choosing
$\Lambda$ and $\Psi^2$ to make $\Sigma$ close to $S$ in some sense [cf. Anderson,
1959, pp. 19-22]. Let $\phi(S,\Sigma)$ be a scalar measure of the distance
between $S$ and $\Sigma$ to be minimized with respect to $\Lambda$ and $\Psi$. It is
convenient to normalize $\phi$ so that $\phi = 0$ when $S = \Sigma$. A desirable
property for $\phi$ is that

       $\phi(S,\Sigma) = \phi(DSD, D\Sigma D)$

for all diagonal matrices $D$ of positive scale factors. Such a $\phi$ will
yield estimates that are scale-free.

One simple measure $\phi$ is the unweighted sum of squares

(4)    $U = \mathrm{tr}(S - \Sigma)^2$ .

This measure, which is minimized by the iterated principal factor method
and the minres method [Harman, 1967, Chapters 8 and 9], is not scale-free
and is therefore usually applied to the correlation matrix $R$ instead of
$S$. Another measure $\phi$ is the function employed in maximum likelihood
(ML) factor analysis [see e.g., Jöreskog, 1967]:

(5)    $F = \mathrm{tr}(\Sigma^{-1}S) - \log|\Sigma^{-1}S| - p$ .

This measure is scale-free and, when $y$ is multinormally distributed, leads
to efficient estimates in large samples.

In this paper, we propose an estimation procedure which calls for
minimization of the quantity

(6)    $G = \frac{1}{2}\mathrm{tr}(I - S^{-1}\Sigma)^2$ .

This yields a scale-free method and, when normality is assumed, produces
estimates which have the same asymptotic properties as the maximum
likelihood estimates.
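The scale-free property can be illustrated numerically. The sketch below evaluates the unweighted, ML and GLS criteria on two small made-up matrices (the matrices, the scale factors and the function names `crit_U`, `crit_F`, `crit_G` are assumptions chosen for illustration) and compares each criterion before and after rescaling by a diagonal matrix D.

```python
import numpy as np


def crit_U(S, Sigma):
    """Unweighted sum of squares tr(S - Sigma)^2."""
    R = S - Sigma
    return np.trace(R @ R)


def crit_F(S, Sigma):
    """ML criterion tr(Sigma^{-1} S) - log|Sigma^{-1} S| - p."""
    M = np.linalg.solve(Sigma, S)            # Sigma^{-1} S
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - S.shape[0]


def crit_G(S, Sigma):
    """GLS criterion (1/2) tr(I - S^{-1} Sigma)^2."""
    M = np.eye(S.shape[0]) - np.linalg.solve(S, Sigma)
    return 0.5 * np.trace(M @ M)


# Illustrative positive definite matrices (not data from the paper).
S = np.array([[2.0, 0.6, 0.4], [0.6, 1.5, 0.3], [0.4, 0.3, 1.0]])
Sigma = np.array([[1.8, 0.5, 0.5], [0.5, 1.6, 0.2], [0.5, 0.2, 1.1]])
D = np.diag([0.5, 2.0, 10.0])                # arbitrary positive scale factors

if __name__ == "__main__":
    for name, crit in (("U", crit_U), ("F", crit_F), ("G", crit_G)):
        print(name, crit(S, Sigma), crit(D @ S @ D, D @ Sigma @ D))
```

Under these assumptions the values of F and G are unchanged by the rescaling, while U changes.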






2. Generalized Least Squares Principle

The background for our proposal is as follows. Assuming that $y$ is
multinormally distributed, $S$ has the Wishart distribution with
expectation $\Sigma_0$, where $\Sigma_0$ is the true population
variance-covariance matrix of $y$. Therefore, a straightforward application
of Aitken's [1934-35] generalized least squares principle would choose
parameter estimates to minimize the quantity

(7)    $G = \frac{1}{2}\mathrm{tr}[\Sigma_0^{-1}(S - \Sigma)]^2$ .

In practice, of course, $\Sigma_0$ is unknown, so that the Aitken procedure
is not operational. Nevertheless $S$ estimates $\Sigma_0$. Using the estimate
$S$ in place of $\Sigma_0$ in (7) gives

(8)    $G = \frac{1}{2}\mathrm{tr}[S^{-1}(S - \Sigma)]^2 = \frac{1}{2}\mathrm{tr}(I - S^{-1}\Sigma)^2$ ,

which is the criterion to be minimized in our modified generalized least
squares (GLS) procedure.

There is an interesting connection between the ML criterion (5) and
the GLS criterion (6). Let $\alpha_1, \ldots, \alpha_p$ denote the characteristic
roots of $S^{-1}\Sigma$; they will be positive and, when $S$ is close to $\Sigma$,
lie in the neighborhood of unity. Since the trace and determinant are
respectively the sum and product of the roots, we see that

(9)    $F = \sum_{m=1}^{p} (1/\alpha_m) + \log\prod_{m=1}^{p}\alpha_m - p = \sum_{m=1}^{p} (1/\alpha_m - 1 + \log\alpha_m)$ .

The characteristic roots of $I - S^{-1}\Sigma$ are $1 - \alpha_1, \ldots, 1 - \alpha_p$,
so that those of $(I - S^{-1}\Sigma)^2$ are $(1 - \alpha_1)^2, \ldots, (1 - \alpha_p)^2$.
Consequently

(10)   $G = \frac{1}{2}\sum_{m=1}^{p} (1 - \alpha_m)^2$ .

Expanding $1/\alpha_m$ and $\log\alpha_m$ in a Taylor series about the point
$\alpha_m = 1$ and discarding terms of order higher than the second gives

       $1/\alpha_m \approx 1 - (\alpha_m - 1) + (\alpha_m - 1)^2$ ,

       $\log\alpha_m \approx (\alpha_m - 1) - \frac{1}{2}(\alpha_m - 1)^2$ .

Thus

(11)   $F \approx \frac{1}{2}\sum_{m=1}^{p} (\alpha_m - 1)^2 = G$ ,

so that the ML criterion can be viewed as an approximation to the GLS
criterion.
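The quality of the second-order approximation can be checked root by root. The sketch below compares the contribution of a single characteristic root to the ML criterion with its contribution to the GLS criterion; the function names `ml_term` and `gls_term` and the trial values of the root are illustrative assumptions.

```python
import math


def ml_term(alpha: float) -> float:
    """Contribution of one root to F: 1/alpha - 1 + log(alpha)."""
    return 1.0 / alpha - 1.0 + math.log(alpha)


def gls_term(alpha: float) -> float:
    """Contribution of one root to G: (1/2)(1 - alpha)^2."""
    return 0.5 * (1.0 - alpha) ** 2


if __name__ == "__main__":
    for alpha in (1.2, 1.05, 1.01):
        print(alpha, ml_term(alpha), gls_term(alpha), ml_term(alpha) - gls_term(alpha))
```

The difference between the two terms is of third order in the deviation of the root from one, so it shrinks quickly as the roots approach unity.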

Our proposal derives from Zellner's [1962] operational approach to
generalized least squares estimation in multivariate regression models with
linear constraints on the regression coefficients. Malinvaud [1966,
Chapter 9] extends the approach to cover nonlinear constraints on the
regression coefficients. Rothenberg [1966, p. 38] indicates a further
extension to cover constraints on the disturbance variance-covariance
matrix. For factor analysis with known factor loadings, Browne [1976]
suggests using weighted least squares with $S$ estimating $\Sigma_0$.
Ultimately, all these procedures are applications of the minimum-$\chi^2$
principle of estimation; cf. Neyman [1949], Taylor [1953], Ferguson [1958].

The GLS principle can be used in confirmatory (restricted) factor
analysis also, but in this paper we shall consider only exploratory
(unrestricted) factor analysis.

3. Reduction of G

The function $G$ in (6) is now regarded as a function $G(\Lambda,\Psi)$ of
$\Lambda$ and $\Psi$ and is to be minimized with respect to these matrices. The
minimization will be done in two steps. We first find the conditional
minimum of $G$ for a given $\Psi$ and then find the overall minimum. To begin
with we shall assume that $\Psi$ is nonsingular. The partial derivative of
$G$ with respect to $\Lambda$ is (see Appendix A1)

(12)   $\partial G/\partial\Lambda = 2S^{-1}(\Sigma - S)S^{-1}\Lambda$ ,

which, when set equal to zero and premultiplied by $S$, gives

(13)   $\Sigma S^{-1}\Lambda = \Lambda$ ,

or

(14)   $S^{-1}\Lambda = \Sigma^{-1}\Lambda$ .

Using

       $\Sigma^{-1} = \Psi^{-2} - \Psi^{-2}\Lambda(I + \Lambda'\Psi^{-2}\Lambda)^{-1}\Lambda'\Psi^{-2}$ ,

(14) simplifies to

       $S^{-1}\Lambda = \Psi^{-2}\Lambda(I + \Lambda'\Psi^{-2}\Lambda)^{-1}$ ,

or

(15)   $(\Psi S^{-1}\Psi)\Psi^{-1}\Lambda = \Psi^{-1}\Lambda(I + \Lambda'\Psi^{-2}\Lambda)^{-1}$ .

The matrix $\Lambda'\Psi^{-2}\Lambda$ may be assumed to be diagonal since it can always
be reduced to diagonal form by a proper choice of an orthogonal
postmultiplier to $\Lambda$. The columns of the matrix on the right side of (15)
then become proportional to the columns of $\Psi^{-1}\Lambda$. Thus the columns of
$\Psi^{-1}\Lambda$ are characteristic vectors of $\Psi S^{-1}\Psi$ and the diagonal
elements of $(I + \Lambda'\Psi^{-2}\Lambda)^{-1}$ are the corresponding roots. It will
be shown that the conditional minimum of $G$, for the given $\Psi$, is
obtained when the columns of $\Psi^{-1}\Lambda$ are chosen as vectors corresponding
to the k smallest characteristic roots of $\Psi S^{-1}\Psi$.

Let $\gamma_1 \le \gamma_2 \le \cdots \le \gamma_p$ be the characteristic roots of
$\Psi S^{-1}\Psi$ and let $\omega_1, \omega_2, \ldots, \omega_p$ be an orthonormal set of
corresponding characteristic vectors. Let
$\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_p)$ be partitioned as
$\Gamma = \mathrm{diag}(\Gamma_1, \Gamma_2)$ where
$\Gamma_1 = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_k)$ and
$\Gamma_2 = \mathrm{diag}(\gamma_{k+1}, \gamma_{k+2}, \ldots, \gamma_p)$, and let
$\Omega = [\omega_1, \omega_2, \ldots, \omega_p]$ be partitioned as
$\Omega = [\Omega_1, \Omega_2]$ where $\Omega_1$ consists of the first k vectors and
$\Omega_2$ of the last p - k vectors. Then

(16)   $\Omega_1'\Omega_1 = I$ ,  $\Omega_1'\Omega_2 = 0$ ,  $\Omega_2'\Omega_2 = I$ ,

(17)   $\Psi S^{-1}\Psi = \Omega_1\Gamma_1\Omega_1' + \Omega_2\Gamma_2\Omega_2'$ ,

and the conditional solution $\tilde\Lambda$ is given by

(18)   $\tilde\Lambda = \Psi\Omega_1(\Gamma_1^{-1} - I)^{1/2}$ .

This conditional solution is identical to that of maximum likelihood factor
analysis [see e.g., Jöreskog, 1967, eq. 17, where the solution is expressed
in terms of the roots and vectors of $\Psi^{-1}S\Psi^{-1}$].

Defining

(19)   $\tilde\Sigma = \tilde\Lambda\tilde\Lambda' + \Psi^2$ ,

it is easily verified from (16) and (18) that

(20)   $\Psi^{-1}\tilde\Sigma\Psi^{-1} = \Omega_1\Gamma_1^{-1}\Omega_1' + \Omega_2\Omega_2'$

and that

(21)   $I - S^{-1}\tilde\Sigma = \Psi^{-1}[\Omega_2(I - \Gamma_2)\Omega_2']\Psi$ ,

so that

       $\mathrm{tr}(I - S^{-1}\tilde\Sigma)^2 = \mathrm{tr}(I - \Gamma_2)^2 = \sum_{m=k+1}^{p} (\gamma_m - 1)^2$ .

Therefore the conditional minimum of $G(\Lambda,\Psi)$, with respect to $\Lambda$ for
a given $\Psi$, is the function $g(\Psi)$ defined by

(22)   $g(\Psi) = \frac{1}{2}\sum_{m=k+1}^{p} (\gamma_m - 1)^2$ .

It is now clear what the effect will be of choosing, as columns of
$\Psi^{-1}\Lambda$, characteristic vectors other than those corresponding to the k
smallest roots. The roots not chosen would then be involved in (22) and
the sum of squares would be larger than or equal to that in (22).
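The reduction can be sketched in a few lines of NumPy. The code below computes the conditional loadings for a given vector of unique standard deviations and confirms that the criterion (6) evaluated at those loadings equals the reduced function (22). The function names, the test loadings `L0`, the uniquenesses `d` and the perturbation of psi are all illustrative assumptions, not material from the paper.

```python
import numpy as np


def conditional_solution(S, psi, k):
    """Conditional minimizer of G over Lambda for fixed psi (a sketch of eqs. 17-22)."""
    A = psi[:, None] * np.linalg.inv(S) * psi[None, :]   # Psi S^{-1} Psi
    gam, om = np.linalg.eigh(A)                          # roots in ascending order
    g1, O1 = gam[:k], om[:, :k]
    # Eq. (18); the clip is a safeguard added here for roots above one,
    # where (18) would have no real solution.
    Lam = psi[:, None] * O1 * np.sqrt(np.clip(1.0 / g1 - 1.0, 0.0, None))
    g = 0.5 * np.sum((gam[k:] - 1.0) ** 2)               # eq. (22)
    return Lam, g


def G(S, Lam, psi):
    """GLS criterion (6) for Sigma = Lam Lam' + Psi^2."""
    Sigma = Lam @ Lam.T + np.diag(psi ** 2)
    M = np.eye(len(psi)) - np.linalg.solve(S, Sigma)
    return 0.5 * np.trace(M @ M)


# Made-up two-factor example.
L0 = np.array([[0.8, 0.1], [0.7, 0.2], [0.6, -0.3],
               [0.2, 0.7], [0.1, 0.8], [-0.1, 0.6]])
d = np.array([0.35, 0.45, 0.55, 0.45, 0.35, 0.60])
S = L0 @ L0.T + np.diag(d)
psi = np.sqrt(d) * np.array([0.95, 0.90, 1.00, 0.85, 0.90, 0.95])

if __name__ == "__main__":
    Lam, g = conditional_solution(S, psi, 2)
    print(g, G(S, Lam, psi))
```

Because the conditional loadings are available in closed form, the numerical search only has to run over the p elements of psi.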

4. Minimization of $g(\Psi)$

We now propose to minimize $g(\Psi)$ numerically by the Newton-Raphson
method, making use of first and second derivatives of $g$. The roots and
vectors $\gamma_m$ and $\omega_m$, m = 1,2,...,p, of $A(\Psi) = \Psi S^{-1}\Psi$ are
functions of $\Psi$. The first and second derivatives of $g(\Psi)$ may be
obtained from the first derivatives of $\gamma_m$ and $\omega_m$. As shown in
Appendix A2, the latter are

(23)   $\partial\gamma_m/\partial\psi_i = (2/\psi_i)\gamma_m\omega_{im}^2$ ,

(24)   $\partial\omega_{im}/\partial\psi_j = (1/\psi_j)\,\omega_{jm}\sum_{n\ne m}\frac{\gamma_m+\gamma_n}{\gamma_m-\gamma_n}\,\omega_{in}\omega_{jn}$ ,

where $\omega_{im}$ is the i-th element of $\omega_m$.

By differentiating (22) with respect to $\psi_i$ we obtain

       $\partial g/\partial\psi_i = \sum_{m=k+1}^{p} (\gamma_m - 1)(\partial\gamma_m/\partial\psi_i)$ ,

which, after substitution from (23), becomes

(25)   $\partial g/\partial\psi_i = (2/\psi_i)\sum_{m=k+1}^{p} (\gamma_m^2 - \gamma_m)\omega_{im}^2$ .

By differentiating (25) with respect to $\psi_j$, we obtain

       $\partial^2 g/\partial\psi_i\partial\psi_j = (2/\psi_i)\sum_{m=k+1}^{p}\{(2\gamma_m - 1)\omega_{im}^2(\partial\gamma_m/\partial\psi_j) + 2(\gamma_m^2 - \gamma_m)\omega_{im}(\partial\omega_{im}/\partial\psi_j) - (1/\psi_i)\delta_{ij}(\gamma_m^2 - \gamma_m)\omega_{im}^2\}$ ,

which, after substitution from (23) and (24) and simplification, becomes

(26)   $\partial^2 g/\partial\psi_i\partial\psi_j = (4/\psi_i\psi_j)\sum_{m=k+1}^{p}\{(2\gamma_m^2 - \gamma_m)\omega_{im}^2\omega_{jm}^2 + (\gamma_m^2 - \gamma_m)\omega_{im}\omega_{jm}\sum_{n\ne m}\frac{\gamma_m+\gamma_n}{\gamma_m-\gamma_n}\,\omega_{in}\omega_{jn} - \frac{1}{2}\delta_{ij}(\gamma_m^2 - \gamma_m)\omega_{im}\omega_{jm}\}$ .

In minimizing $g(\Psi)$ we shall follow a procedure similar to Clarke's
[1970] method for maximum likelihood factor analysis. Clarke used the
roots and vectors of $\Psi^{-1}S\Psi^{-1}$ and minimized a function of $\psi_i^2$
rather than $\psi_i$. While this method works satisfactorily in all cases
where the solution is proper, having no $\psi_i^2$ very close to zero, certain
improvements can be made to handle Heywood cases (improper solutions) more
effectively. When one or more of the $\psi_i$ are close to zero both first-
and second-order derivatives are poorly defined and difficult to compute
accurately. Jöreskog [1967] describes a procedure to deal with this
difficulty, which involves (i) fixing $\psi_i^2$ at some arbitrary small
positive value such as 0.001 for subsequent iterations and minimizing with
respect to the remaining $\psi_i$ and (ii) once a Heywood variable with
$\psi_i^2 = 0.001$ has been found, this variable is partialed out and the
minimization process repeated with fewer factors on a smaller matrix.
Although this is quite correct in principle, it is somewhat time consuming.
When working with the roots and vectors of $\Psi S^{-1}\Psi$, rather than those
of $\Psi^{-1}S\Psi^{-1}$, the partial elimination of variables may be completely
avoided. Jennrich and Robinson [1969], operating on the roots and vectors
of $S^{-1/2}\Psi^2 S^{-1/2}$ instead of on those of $\Psi S^{-1}\Psi$, used a similar
procedure which also does not break down when $\Psi$ is singular.
Furthermore, a transformation of variables may be made which makes the
derivatives stable even at $\psi_i = 0$. This transformation from $\psi_i$ to
$\theta_i$ is defined by

(27)   $\theta_i = \log\psi_i^2$ ;  $\psi_i = +\sqrt{e^{\theta_i}}$ .

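We now consider $g$ as a function of $\theta_1, \theta_2, \ldots, \theta_p$ instead of
$\psi_1, \psi_2, \ldots, \psi_p$. The new function $g(\theta)$ is defined for all
$\theta_i$, $-\infty < \theta_i < +\infty$. Note that $\psi_i = 0$ corresponds to
$\theta_i = -\infty$.

The derivatives $\partial g/\partial\theta_i$ and $\partial^2 g/\partial\theta_i\partial\theta_j$ are
obtained from $\partial g/\partial\psi_i$ and $\partial^2 g/\partial\psi_i\partial\psi_j$ by

       $\partial g/\partial\theta_i = (\psi_i/2)(\partial g/\partial\psi_i)$ ,

       $\partial^2 g/\partial\theta_i\partial\theta_j = (\psi_i\psi_j/4)(\partial^2 g/\partial\psi_i\partial\psi_j) + \delta_{ij}(\psi_i/4)(\partial g/\partial\psi_i)$ .

These derivatives therefore become

(28)   $\partial g/\partial\theta_i = \sum_{m=k+1}^{p} (\gamma_m^2 - \gamma_m)\omega_{im}^2$ ,

(29)   $\partial^2 g/\partial\theta_i\partial\theta_j = \sum_{m=k+1}^{p} (2\gamma_m^2 - \gamma_m)\omega_{im}^2\omega_{jm}^2 + \sum_{m=k+1}^{p} (\gamma_m^2 - \gamma_m)\omega_{im}\omega_{jm}\sum_{n\ne m}\frac{\gamma_m+\gamma_n}{\gamma_m-\gamma_n}\,\omega_{in}\omega_{jn}$

       $= \sum_{m=k+1}^{p} (2\gamma_m^2 - \gamma_m)\omega_{im}^2\omega_{jm}^2 + \sum_{m=k+1}^{p} (\gamma_m^2 - \gamma_m)\omega_{im}\omega_{jm}\sum_{n=1}^{k}\frac{\gamma_m+\gamma_n}{\gamma_m-\gamma_n}\,\omega_{in}\omega_{jn} + \sum_{m=k+1}^{p} (\gamma_m^2 - \gamma_m)\omega_{im}\omega_{jm}\sum_{n=k+1,\,n\ne m}^{p}\frac{\gamma_m+\gamma_n}{\gamma_m-\gamma_n}\,\omega_{in}\omega_{jn}$ .

The last term may be written

       $\sum_{m=k+1}^{p}\sum_{n=k+1}^{m-1}\left[\frac{(\gamma_m^2-\gamma_m)(\gamma_m+\gamma_n)}{\gamma_m-\gamma_n} + \frac{(\gamma_n^2-\gamma_n)(\gamma_n+\gamma_m)}{\gamma_n-\gamma_m}\right]\omega_{im}\omega_{jm}\omega_{in}\omega_{jn}$

       $= \sum_{m=k+1}^{p}\sum_{n=k+1}^{m-1} (\gamma_m+\gamma_n)(\gamma_m+\gamma_n-1)\,\omega_{im}\omega_{jm}\omega_{in}\omega_{jn}$

       $= \frac{1}{2}\sum_{m=k+1}^{p}\sum_{n=k+1,\,n\ne m}^{p} (\gamma_m+\gamma_n)(\gamma_m+\gamma_n-1)\,\omega_{im}\omega_{jm}\omega_{in}\omega_{jn}$ ,

which after substitution into (29), simplification and use of the relation

       $\sum_{m=k+1}^{p}\omega_{im}\omega_{jm} = \delta_{ij} - \sum_{n=1}^{k}\omega_{in}\omega_{jn}$ ,

gives

(30)   $\partial^2 g/\partial\theta_i\partial\theta_j = \left(\sum_{m=k+1}^{p}\gamma_m\omega_{im}\omega_{jm}\right)^2 + \delta_{ij}(\partial g/\partial\theta_i) + 2\sum_{m=k+1}^{p} (\gamma_m^2 - \gamma_m)\omega_{im}\omega_{jm}\sum_{n=1}^{k}\frac{\gamma_n}{\gamma_m-\gamma_n}\,\omega_{in}\omega_{jn}$ .

When $\gamma_{k+1}, \gamma_{k+2}, \ldots, \gamma_p$ are all close to one, this is approximately

(31)   $\partial^2 g/\partial\theta_i\partial\theta_j \approx \left(\sum_{m=k+1}^{p}\omega_{im}\omega_{jm}\right)^2$ .

It should be noted that the function and all derivatives of first and second
order may be computed accurately everywhere, even at $\theta_i = -\infty$ ($\psi_i = 0$).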
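The function value and the derivatives with respect to the log-variance parameters can be sketched directly from the roots and vectors. In the code below the function names and the small AR(1)-type test matrix `S` are illustrative assumptions; the exact second derivatives are written in the reduced form that only involves differences between roots of the last p - k and the first k, with the approximate matrix available through `exact=False`.

```python
import numpy as np


def roots_vectors(S_inv, theta):
    psi = np.exp(0.5 * theta)                    # psi_i = +sqrt(exp(theta_i))
    A = psi[:, None] * S_inv * psi[None, :]      # Psi S^{-1} Psi
    return np.linalg.eigh(A)                     # ascending roots, orthonormal vectors


def g_value(S_inv, theta, k):
    gam, _ = roots_vectors(S_inv, theta)
    return 0.5 * np.sum((gam[k:] - 1.0) ** 2)


def g_gradient(S_inv, theta, k):
    gam, om = roots_vectors(S_inv, theta)
    return (om[:, k:] ** 2) @ (gam[k:] ** 2 - gam[k:])


def g_hessian(S_inv, theta, k, exact=True):
    gam, om = roots_vectors(S_inv, theta)
    g1, g2, O1, O2 = gam[:k], gam[k:], om[:, :k], om[:, k:]
    if not exact:
        return (O2 @ O2.T) ** 2                  # approximate second derivatives
    H = ((O2 * g2) @ O2.T) ** 2                  # squared weighted cross-products
    H += np.diag((O2 ** 2) @ (g2 ** 2 - g2))     # delta_ij times the gradient
    for m in range(len(g2)):                     # coupling with the first k roots
        inner = (O1 * (g1 / (g2[m] - g1))) @ O1.T
        H += 2.0 * (g2[m] ** 2 - g2[m]) * np.outer(O2[:, m], O2[:, m]) * inner
    return H


# Illustrative 5 x 5 matrix with entries 0.5^|i-j| (not data from the paper).
idx = np.arange(5)
S = 0.5 ** np.abs(np.subtract.outer(idx, idx))
S_inv = np.linalg.inv(S)
theta0 = np.log(0.9 / np.diag(S_inv)) + 0.05 * idx

if __name__ == "__main__":
    print(g_value(S_inv, theta0, 1))
    print(g_gradient(S_inv, theta0, 1))
    print(g_hessian(S_inv, theta0, 1))
```

A finite-difference comparison is a convenient way to validate an implementation of these formulas before using them inside an iterative routine.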
Let $\theta$ denote a column vector with elements $\theta_1, \theta_2, \ldots, \theta_p$ and
let $h$ and $H$ denote the column vector and matrix of corresponding
derivatives $\partial g/\partial\theta$ and $\partial^2 g/\partial\theta\partial\theta'$, respectively. Let
$\theta^{(s)}$ denote the value of $\theta$ in the s-th iteration and let $h^{(s)}$ and
$H^{(s)}$ be the corresponding vector and matrix of first- and second-order
derivatives. The iteration procedure may then be written

(32)   $H^{(s)}\delta^{(s)} = h^{(s)}$ ,

(33)   $\theta^{(s+1)} = \theta^{(s)} - \delta^{(s)}$ ,

where $\delta^{(s)}$ is a column vector of corrections determined by (32). The
Newton-Raphson procedure is therefore easy to apply, the main computations
in each iteration being the computation of the roots and vectors of
$\Psi S^{-1}\Psi$ and the solution of the symmetric system (32). It has been found
that the Newton-Raphson procedure is very efficient, generally requiring
only a few iterations for convergence. The convergence criterion is that
the largest absolute correction be less than a prescribed small number
$\epsilon$. The minimizing $\theta$ may be determined very accurately, if desired,
by choosing $\epsilon$ very small.

In detail the numerical method is as follows. The starting point
$\theta^{(1)}$ is chosen as [see e.g., Jöreskog, 1963, eqs. 6.20 and 7.10 or
Jöreskog, 1967, eq. 26],

(34)   $\theta_i^{(1)} = \log[(1 - k/2p)(1/s^{ii})]$ ,

where $s^{ii}$ is the i-th diagonal element of $S^{-1}$.
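The starting point is cheap to compute. A minimal sketch follows, with `starting_theta` as an illustrative function name and an identity matrix as a trivial test case.

```python
import numpy as np


def starting_theta(S, k):
    """Starting values theta_i = log[(1 - k/2p)(1/s^{ii})]."""
    p = S.shape[0]
    s_inv_diag = np.diag(np.linalg.inv(S))          # s^{ii}, diagonal of S^{-1}
    return np.log((1.0 - k / (2.0 * p)) / s_inv_diag)


if __name__ == "__main__":
    print(starting_theta(np.eye(4), 1))
```

Since 1/s^{ii} never exceeds the i-th diagonal element of a positive definite S, the implied starting uniquenesses exp(theta_i) lie strictly below the observed variances.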



The exact matrix $H$ of second-order derivatives given by (30) may not be
positive definite in the beginning. Therefore, the approximation $E$ given
by (31), which is always Gramian, is used in the first iteration and for as
long as the maximum absolute correction is greater than 0.1. After that,
$H$ is used if it is positive definite. It has been found empirically that
$E$ gives good reductions in function values in the early iterations but is
comparatively ineffective near the minimum, whereas $H$ near the minimum is
very effective.

To compute the characteristic roots and vectors of $\Psi S^{-1}\Psi$ in each
iteration, we use the Householder transformation to tridiagonal form, the
QR method for the roots of the tridiagonal matrix and inverse iteration
for the vectors. This is probably the most efficient method available
[see Wilkinson, 1965]. The system of equations (32) is solved by the
square root factorization $H = TT'$, where $T$ is lower triangular.
This shows at an early stage whether $H$ is positive definite or not.
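The iteration can be sketched compactly. The code below is a simplified illustration, not the authors' FORTRAN program: it uses only the Gramian approximation to the second derivatives throughout, adds a step-halving safeguard of my own, omits the switch to the exact matrix and the Heywood modification, and relies on NumPy's symmetric eigensolver and Cholesky routine in place of the Householder/QR code. The function name `fit_gls` and the made-up loadings and uniquenesses are assumptions for the example.

```python
import numpy as np


def fit_gls(S, k, eps=1e-8, max_iter=200):
    """Minimize g over theta = log(psi^2); returns (psi^2, minimum of g)."""
    p = S.shape[0]
    S_inv = np.linalg.inv(S)
    theta = np.log((1.0 - k / (2.0 * p)) / np.diag(S_inv))   # starting point

    def parts(th):
        psi = np.exp(0.5 * th)
        gam, om = np.linalg.eigh(psi[:, None] * S_inv * psi[None, :])
        g2, O2 = gam[k:], om[:, k:]
        g = 0.5 * np.sum((g2 - 1.0) ** 2)        # function value
        h = (O2 ** 2) @ (g2 ** 2 - g2)           # first derivatives
        E = (O2 @ O2.T) ** 2                     # approximate second derivatives
        return g, h, E

    g, h, E = parts(theta)
    for _ in range(max_iter):
        T = np.linalg.cholesky(E)                # E = T T'; raises if E is not positive definite
        delta = np.linalg.solve(T.T, np.linalg.solve(T, h))
        step = 1.0
        g_new, h_new, E_new = parts(theta - delta)
        while g_new > g and step > 1e-8:         # step halving (safeguard added here)
            step *= 0.5
            g_new, h_new, E_new = parts(theta - step * delta)
        if g_new > g:
            break                                # no further decrease obtainable
        theta = theta - step * delta
        g, h, E = g_new, h_new, E_new
        if np.max(np.abs(step * delta)) < eps:   # convergence criterion
            break
    return np.exp(theta), g


# Made-up covariance matrix that satisfies a two-factor model exactly.
L0 = np.array([[0.8, 0.1], [0.7, 0.2], [0.6, -0.3],
               [0.2, 0.7], [0.1, 0.8], [-0.1, 0.6]])
d = np.array([0.35, 0.45, 0.55, 0.45, 0.35, 0.60])
S = L0 @ L0.T + np.diag(d)

if __name__ == "__main__":
    psi2, g_min = fit_gls(S, 2)
    print(psi2, g_min)
```

For a matrix that fits the model exactly, as in this example, the approximate matrix coincides with the exact second derivatives at the solution, so little is lost by the simplification; for real data the exact matrix would be expected to converge faster near the minimum, as the text reports.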

In Heywood cases, when one or more of the $\theta_i \to -\infty$, i.e.,
$\psi_i \to 0$, a slight modification of the Newton-Raphson procedure is
necessary to achieve fast convergence. This is due to the fact that the
search for the minimum is then along a "valley" and not in a quadratic
region. When $\theta_i \to -\infty$, $\partial g/\partial\theta_i \to 0$ and
$\partial^2 g/\partial\theta_i\partial\theta_j \to 0$, j = 1,2,...,p, so that when $\theta_i$ is
small the i-th element of $h$ and the i-th row and column of $H$ and $E$ are
also small. This tends to produce a "bad" correction vector $\delta$ and the
function may increase instead of decrease. A simple and effective way to
deal with this problem is to delete the i-th equation in the system (32)
and compute the corrections for all the other $\theta$'s from the reduced
system. One then computes the correction for $\theta_i$ as

(35)   $\delta_i = (\partial g/\partial\theta_i)/(\partial^2 g/\partial\theta_i^2)$ .

This procedure will decrease $\theta_i$ slowly in the beginning but
faster the more evident it is that $\theta_i$ is a Heywood variable. When
$\theta_i$ has become less than -10 it is not necessary to change $\theta_i$ any
more unless
is negative. Thus, the procedure corrects itself quickly if a 
variable is incorrectly taken as a Heywood variable. 
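The modified correction step for a suspected Heywood variable can be sketched as follows. The function name `heywood_corrections` and the small numerical example are illustrative assumptions; the logic deletes the i-th equation, solves the reduced system for the other corrections, and sets the i-th correction to the ratio of the first to the second derivative.

```python
import numpy as np


def heywood_corrections(h, H, i):
    """Corrections when variable i is treated as a Heywood variable."""
    keep = np.arange(len(h)) != i
    delta = np.empty_like(h, dtype=float)
    # Reduced system: row and column i of H are deleted.
    delta[keep] = np.linalg.solve(H[np.ix_(keep, keep)], h[keep])
    # Separate one-dimensional correction for the Heywood variable.
    delta[i] = h[i] / H[i, i]
    return delta


if __name__ == "__main__":
    h = np.array([1.0, 2.0, 1e-6])
    H = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1e-5]])
    print(heywood_corrections(h, H, 2))
```

Decoupling the nearly degenerate row and column keeps the small pivots of the Heywood variable from contaminating the corrections for the remaining variables.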



5. Asymptotic Distribution Theory

In this section we show that the GLS estimates and the ML estimates
have the same asymptotic properties. In particular we shall evaluate the
common asymptotic variance-covariance matrix of the estimates of
$\psi_1, \psi_2, \ldots, \psi_p$.

It is assumed that $S$ converges stochastically to $\Sigma$ of the form
(2), and that the elements of $\sqrt{n}(S - \Sigma)$ have an asymptotic
multinormal distribution with variances and covariances given by

(36)   $n E[(s_{\alpha\beta} - \sigma_{\alpha\beta})(s_{\mu\nu} - \sigma_{\mu\nu})] = \sigma_{\alpha\mu}\sigma_{\beta\nu} + \sigma_{\alpha\nu}\sigma_{\beta\mu}$ ,

which are the elements of $2(\Sigma\otimes\Sigma)$. In particular, this is true when
the observations on $y$ are drawn from a multinormal distribution with
variance-covariance matrix $\Sigma$. The matrices $\Sigma$, $\Lambda$ and $\Psi$ now denote
the true population values as distinguished from the mathematical
variables $\Lambda$ and $\Psi$ used in the previous sections. It is furthermore
assumed that $\psi_i > 0$, i = 1,2,...,p, i.e., that the population is not a
Heywood case.

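Let $A = \Psi\Sigma^{-1}\Psi$ and let $\gamma_1 \le \gamma_2 \le \cdots \le \gamma_p$ be the
characteristic roots of $A$ with $\omega_1, \omega_2, \ldots, \omega_p$ an orthonormal set
of corresponding characteristic vectors. Let
$\Gamma_1 = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_k)$,
$\Gamma_2 = \mathrm{diag}(\gamma_{k+1}, \gamma_{k+2}, \ldots, \gamma_p)$,
$\Omega_1 = [\omega_1, \omega_2, \ldots, \omega_k]$ and
$\Omega_2 = [\omega_{k+1}, \omega_{k+2}, \ldots, \omega_p]$. We assume that the roots in
$\Gamma_1$ are all distinct. Then

(37)   $A = \Omega_1\Gamma_1\Omega_1' + \Omega_2\Gamma_2\Omega_2'$

and

(38)   $A^{-1} = \Omega_1\Gamma_1^{-1}\Omega_1' + \Omega_2\Gamma_2^{-1}\Omega_2'$ .

However, since

(39)   $A^{-1} = \Psi^{-1}\Sigma\Psi^{-1} = \Psi^{-1}\Lambda\Lambda'\Psi^{-1} + I$ ,

we have that $\gamma_{k+1} = \gamma_{k+2} = \cdots = \gamma_p = 1$, or

(40)   $\Gamma_2 = I$ .

Defining

(41)   $\Xi = \Omega_2\Omega_2'$ ,

it follows from (37), (38), (40), $\Omega_1'\Omega_2 = 0$ and $\Omega_2'\Omega_2 = I$ that
$\Xi$ has the properties

(42)   $A\Xi = A^{-1}\Xi = \Xi A = \Xi A^{-1} = \Xi^2 = \Xi$ .

Corresponding to the population quantities in (37) and (38) we have the
corresponding sample quantities

(43)   $\hat A = \hat\Omega_1\hat\Gamma_1\hat\Omega_1' + \hat\Omega_2\hat\Gamma_2\hat\Omega_2'$

and

(44)   $\hat A^{-1} = \hat\Omega_1\hat\Gamma_1^{-1}\hat\Omega_1' + \hat\Omega_2\hat\Gamma_2^{-1}\hat\Omega_2'$ ,

where $\hat\Gamma_1 = \mathrm{diag}(\hat\gamma_1, \hat\gamma_2, \ldots, \hat\gamma_k)$ and
$\hat\Gamma_2 = \mathrm{diag}(\hat\gamma_{k+1}, \hat\gamma_{k+2}, \ldots, \hat\gamma_p)$ are diagonal
matrices of the characteristic roots
$\hat\gamma_1 \le \hat\gamma_2 \le \cdots \le \hat\gamma_p$ of $\hat A = \hat\Psi S^{-1}\hat\Psi$, and
$\hat\Omega_1$ of order p x k and $\hat\Omega_2$ of order p x (p - k) are matrices of
corresponding orthonormal characteristic vectors. These are the quantities
obtained at the minimum of $g(\Psi)$.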

We shall show that $\hat\Psi$ converges stochastically to $\Psi$. The function
$g(\Psi)$ in (22) is also a function of $S$ and will now be denoted
$g(S,\Psi^*)$. The estimate $\hat\Psi$ is defined as the value of $\Psi^*$ that
minimizes $g(S,\Psi^*)$ for a given $S$. But $g(S,\Psi^*)$ converges
stochastically to $g(\Sigma,\Psi^*)$, which has a unique minimum at
$\Psi^* = \Psi$. Since the functions are continuous, $\hat\Psi$ must converge
stochastically to $\Psi$.

In deriving various asymptotic results we shall make repeated use
of the following well-known lemma [see e.g., Wilks, 1962, p. 103]: If
$C = (c_{ij})$ is a matrix whose elements are continuous functions of random
variables $x_1, x_2, \ldots, x_m$ and if plim $x_k = \xi_k$ exists and is finite
for all k, then plim $C(x) = C(\xi)$.

From this it follows immediately that

(45)   plim $\hat A$ = plim $\hat\Psi S^{-1}\hat\Psi = \Psi\Sigma^{-1}\Psi = A$ ,

and that plim $\hat\gamma_m = \gamma_m$, plim $\hat\omega_m = \omega_m$.
Hence, from (30) and (40) we have that

       plim $\partial^2 g/\partial\theta_i\partial\theta_j = \left(\sum_{m=k+1}^{p}\omega_{im}\omega_{jm}\right)^2$ ,

and from (26) that [cf. Anderson & Rubin, 1956, eq. 12.24; Lawley, 1967,
eq. 7 and Jöreskog, 1967, eq. 101]

(46)   plim $\partial^2 g/\partial\psi_i\partial\psi_j = (4/\psi_i\psi_j)\left(\sum_{m=k+1}^{p}\omega_{im}\omega_{jm}\right)^2$ .

The asymptotic variance-covariance matrix of the ML estimates of the
$\psi$'s is given by $(2/n)E^{-1}$, where $E$ is the matrix whose ij-th element
is given by the right-hand side of (46). We proceed to show that
$(2/n)E^{-1}$ is also the asymptotic variance-covariance matrix of the GLS
estimates of the $\psi$'s.

The GLS estimates $\hat\psi_1, \hat\psi_2, \ldots, \hat\psi_p$ are defined implicitly
by the following equations

       $\partial g/\partial\psi_i = 0$ ,  i = 1,2,...,p ,

which by (25) may be written

(47)   $\mathrm{diag}[\hat\Omega_2(\hat\Gamma_2^2 - \hat\Gamma_2)\hat\Omega_2'] = 0$ .

We shall write (47) linearly in statistical differentials. The symbol
$\delta$ is used to denote deviations of sample from population values. All
such deviations are of order $n^{-1/2}$ in probability and since we assume
that n is large, we shall neglect in what follows terms of second and
higher degrees in the $\delta$'s. Let $\delta\Gamma_2 = \hat\Gamma_2 - I$,
$\delta\Omega_2 = \hat\Omega_2 - \Omega_2$ and $\delta A = \hat A - A$. Then we have to the
order of approximation indicated $\hat\Gamma_2^2 = I + 2\delta\Gamma_2$,
$\hat\Gamma_2^2 - \hat\Gamma_2 = \delta\Gamma_2$ and
$\hat\Omega_2(\hat\Gamma_2^2 - \hat\Gamma_2)\hat\Omega_2' = \Omega_2\delta\Gamma_2\Omega_2'$. But
$\delta\Gamma_2 = \Omega_2'\delta A\,\Omega_2$, which may be verified from
$A\Omega_2 = \Omega_2\Gamma_2$ and $\Omega_2'\Omega_2 = I$. Hence, (47) is asymptotically
equivalent to

(48)   $\mathrm{diag}(\Xi\,\delta A\,\Xi) = 0$ .

Furthermore, with $\delta\Psi = \hat\Psi - \Psi$ and $\delta\Sigma = S - \Sigma$ we have to
the same order of approximation

       $S^{-1} = (\Sigma + \delta\Sigma)^{-1} = \Sigma^{-1} - \Sigma^{-1}\delta\Sigma\,\Sigma^{-1}$

and

       $\delta A = (\Psi + \delta\Psi)(\Sigma^{-1} - \Sigma^{-1}\delta\Sigma\,\Sigma^{-1})(\Psi + \delta\Psi) - A$
       $= \delta\Psi\,\Sigma^{-1}\Psi + \Psi\Sigma^{-1}\delta\Psi - \Psi\Sigma^{-1}\delta\Sigma\,\Sigma^{-1}\Psi$
       $= \delta\Psi\,\Psi^{-1}A + A\Psi^{-1}\delta\Psi - A\Psi^{-1}\delta\Sigma\,\Psi^{-1}A$ ,

which after substitution into (48) and use of (42) shows that (48) is
asymptotically equivalent to

(49)   $2\,\mathrm{diag}(\Xi\Psi^{-1}\delta\Psi\,\Xi) = \mathrm{diag}(\Xi\Psi^{-1}\delta\Sigma\,\Psi^{-1}\Xi)$ .

From (36) it follows that the elements of $T = \Psi^{-1}\delta\Sigma\,\Psi^{-1}$ have a
limiting multinormal distribution with variances and covariances given by

(50)   $n E(t_{\alpha\beta}t_{\mu\nu}) = a^{\alpha\mu}a^{\beta\nu} + a^{\alpha\nu}a^{\beta\mu}$ ,

where $a^{ij}$ denotes the ij-th element of $A^{-1}$.

Equation (49) is linear in $\delta\psi_1, \delta\psi_2, \ldots, \delta\psi_p$ and may be
written in scalar form as

(51)   $2\sum_{g=1}^{p}\xi_{ig}^2(\delta\psi_g/\psi_g) = \sum_{\alpha=1}^{p}\sum_{\beta=1}^{p}\xi_{i\alpha}\xi_{i\beta}t_{\alpha\beta}$ ,  i = 1,2,...,p ,

which may be solved for $\delta\psi_g/\psi_g$ if the matrix $\Phi$ with elements
$\phi_{ig} = \xi_{ig}^2 = \left(\sum_{m=k+1}^{p}\omega_{im}\omega_{gm}\right)^2$ is nonsingular. The
solution is

(52)   $\delta\psi_g/\psi_g = \frac{1}{2}\sum_{i=1}^{p}\sum_{\alpha=1}^{p}\sum_{\beta=1}^{p}\phi^{gi}\xi_{i\alpha}\xi_{i\beta}t_{\alpha\beta}$ ,  g = 1,2,...,p ,

where $\phi^{gi}$ denotes the gi-th element of $\Phi^{-1}$.

Equation (52) shows that $\delta\psi_1, \delta\psi_2, \ldots, \delta\psi_p$ are asymptotically
linear in the elements of $T$ and hence will have a limiting multinormal
distribution. To obtain the asymptotic variance-covariance matrix of
$\hat\psi_1, \hat\psi_2, \ldots, \hat\psi_p$ we write equation (52) with indices h, j,
$\mu$ and $\nu$ instead of g, i, $\alpha$ and $\beta$ respectively, multiply these
equations and use (50) and (42). This gives

       $(n/\psi_g\psi_h)E(\delta\psi_g\,\delta\psi_h) = (1/4)\sum_i\sum_j\sum_\alpha\sum_\beta\sum_\mu\sum_\nu \phi^{gi}\phi^{hj}\xi_{i\alpha}\xi_{i\beta}\xi_{j\mu}\xi_{j\nu}(a^{\alpha\mu}a^{\beta\nu} + a^{\alpha\nu}a^{\beta\mu})$

       $= (1/2)\sum_i\sum_j \phi^{gi}\phi^{hj}\xi_{ij}^2$

       $= (1/2)\sum_i\sum_j \phi^{gi}\phi_{ij}\phi^{jh}$

       $= (1/2)\phi^{gh}$ .

Hence,

(53)   $E(\delta\psi_g\,\delta\psi_h) = (\psi_g\psi_h/2n)\phi^{gh}$ ,

which is the gh-th element of $(2/n)E^{-1}$. This, therefore, shows that the
asymptotic variance-covariance matrix $(2/n)E^{-1}$ is the same for both the
ML estimates and the GLS estimates.
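The final covariance formula lends itself to a short numerical sketch. In the code below the function name `asymptotic_cov_psi` and the example loadings and uniquenesses are assumptions. The projector onto the characteristic vectors with unit roots is computed without an eigendecomposition, as the projector onto the orthogonal complement of the columns of the rescaled loadings; this shortcut is my own restatement of the population decomposition of the inverse of A, not a step taken in the text.

```python
import numpy as np


def asymptotic_cov_psi(Lam, psi, n):
    """Asymptotic covariance matrix of the psi estimates: (psi_g psi_h / 2n) phi^{gh}."""
    p = len(psi)
    B = Lam / psi[:, None]                                  # Psi^{-1} Lambda
    # Projector onto the orthogonal complement of the columns of B,
    # i.e. onto the characteristic vectors of A with unit roots.
    Xi = np.eye(p) - B @ np.linalg.solve(B.T @ B, B.T)
    Phi = Xi ** 2                                           # phi_ij = xi_ij^2
    return np.outer(psi, psi) * np.linalg.inv(Phi) / (2.0 * n)


# Made-up population values for illustration.
L0 = np.array([[0.8, 0.1], [0.7, 0.2], [0.6, -0.3],
               [0.2, 0.7], [0.1, 0.8], [-0.1, 0.6]])
psi0 = np.sqrt(np.array([0.35, 0.45, 0.55, 0.45, 0.35, 0.60]))

if __name__ == "__main__":
    cov = asymptotic_cov_psi(L0, psi0, 200)
    print(np.sqrt(np.diag(cov)))        # approximate standard errors of the psi estimates
```

Evaluated at estimates rather than population values, the same expression would give large-sample standard errors, under the distributional assumptions of this section.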

Lawley [1967] obtained the unconditional asymptotic distribution of
the ML estimate $\hat\Lambda$ from the conditional asymptotic distribution of
$\tilde\Lambda$ for given $\Psi$ [Lawley, 1953]. Since the conditional estimate
$\tilde\Lambda$ is the same for both ML and GLS, it follows that also the GLS
estimate $\hat\Lambda$ has the same asymptotic distribution.

Another well-known result for the ML method is that n times the
minimum value of $F$ in (5) is asymptotically distributed as $\chi^2$
with $d = \frac{1}{2}[(p-k)^2 - (p+k)]$ degrees of freedom. The same statement
is true also for the GLS method. To prove this we show that both minima
are asymptotically equivalent.

Let $\tilde\Psi$ denote the maximum likelihood estimates of $\Psi$. Since
$\tilde\Psi$ is asymptotically equivalent to $\hat\Psi$, the characteristic roots
$\tilde\gamma_1, \tilde\gamma_2, \ldots, \tilde\gamma_p$ of $\tilde\Psi S^{-1}\tilde\Psi$ are
asymptotically equivalent to the corresponding roots
$\hat\gamma_1, \hat\gamma_2, \ldots, \hat\gamma_p$ of $\hat\Psi S^{-1}\hat\Psi$. The minimum of
$F$ is [see e.g., Jöreskog, 1967, eq. 18]

       $\sum_{m=k+1}^{p} (\log\tilde\gamma_m + 1/\tilde\gamma_m - 1)$ ,

which is asymptotically equivalent to

       $\sum_{m=k+1}^{p}\left[\log(1 + \delta\hat\gamma_m) + \frac{1}{1 + \delta\hat\gamma_m} - 1\right] = \frac{1}{2}\sum_{m=k+1}^{p} (\delta\hat\gamma_m)^2 = \frac{1}{2}\sum_{m=k+1}^{p} (\hat\gamma_m - 1)^2$ .



6. Results and Comparisons on Numerical Data

The algorithm described in section 4 has been implemented in a FORTRAN
program and run on several matrices. It is interesting to compare the
results of GLS and ML on the same two correlation matrices, Data 1 and
Data 2, that Jöreskog [1967] and Clarke [1970] analyzed with the ML method.
The correlation matrices are given in both of these papers.

Data 1 is a correlation matrix of order 9 x 9 and is analyzed with
three factors. The course of the minimization is shown in Table 1. It is
seen that the convergence is quite rapid and that the solution can be
determined very accurately, to about five decimals in the $\theta$'s. This
corresponds to an accuracy of about seven decimals in $\psi_i$. The solution
is given in Table 2 along with the ML solution. It is seen that the two
solutions are very close, so close that interpretations of the data will
be the same. The value of $\chi^2$ with 12 degrees of freedom and based on
n = 210, is 6.98 with GLS and 7.35 with ML. These are very close in
this case, where the fit is very good.

Data 2 is a correlation matrix of order 10 x 10 and is analyzed with
four factors. The maximum likelihood solution for this data is a Heywood
case with $\psi_i = 0$ for variable 8. The behavior under the GLS
minimization is shown in Table 3. In this case it takes nine iterations to
achieve convergence. This is because $\theta_8$ goes very slowly to -10 and
reaches -10 at iteration 5. After that, convergence is quadratic. The GLS
and ML solutions are given in Table 4. Also in this case the two solutions
are very close. The corresponding $\chi^2$ values, 19.40 with GLS and 18.45
with ML based on 11 degrees of freedom and n = 809, are somewhat further
apart, despite the fact that n is large. However, the fit of the factor
model is not as good as in the Data 1 example.

It should be noted that for the GLS estimates it does not hold that
$\hat\Psi^2 = \mathrm{diag}(S - \hat\Lambda\hat\Lambda')$, which holds for ML estimates. In the
examples, communalities and uniquenesses do not add up to unity. Also it
can be seen in both Table 2 and Table 4 that the GLS estimates of
$\psi_i^2$ are generally smaller than the ML estimates. This suggests that
the GLS estimates may be systematically biased.
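The GLS test statistics quoted in this section can be recomputed from the final function values in the minimization tables, since the statistic is n times the minimum of the criterion. The sketch below does this arithmetic; the function names are illustrative, and the inputs are the final function values 0.03321503 and 0.02398478 together with n = 210 and n = 809.

```python
def chi_square(n: int, g_min: float) -> float:
    """Large-sample test statistic: n times the minimum of the criterion."""
    return n * g_min


def degrees_of_freedom(p: int, k: int) -> int:
    """d = (1/2)[(p - k)^2 - (p + k)]."""
    return ((p - k) ** 2 - (p + k)) // 2


if __name__ == "__main__":
    print(round(chi_square(210, 0.03321503), 2), degrees_of_freedom(9, 3))
    print(round(chi_square(809, 0.02398478), 2), degrees_of_freedom(10, 4))
```

Both products agree with the GLS values of 6.98 and 19.40 reported in the text to two decimals, with 12 and 11 degrees of freedom respectively.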
TABLE 1

Details of the GLS Minimization for Data 1

Iteration   Type   Function      Max. correction   Max. gradient
    0        --    0.1170246          --           2.45 x 10^-1
    1        E     0.04017278    6.64 x 10^-1      5.04 x 10^-2
    2        E     0.03541929    1.71 x 10^-1      1.06 x 10^-2
    3        E     0.05321625    2.73 x 10^-2      7.36 x 10^?
    4        H     0.03321503    2.91 x 10^?       3.64 x 10^?
    5        H     0.03321503    2.67 x 10^-5      2.37 x 10^-10


TABLE 2

Solutions for Data 1

                      GLS                                        ML
 i   \lambda_{i1}  \lambda_{i2}  \lambda_{i3}  \psi_i^2    \lambda_{i1}  \lambda_{i2}  \lambda_{i3}  \psi_i^2
 1      .662          .325         -.082        .445          .664          .321         -.073        .450
 2      .688          .255          .191        .416          .689          .247          .193        .427
 3      .491          .310          .225        .600          .493          .302          .222        .617
 4      .839         -.286          .041        .208          .837         -.292          .035        .212
 5      .708         -.309          .162        .370          .705         -.315          .153        .381
 6      .823         -.376         -.106        .168          .819         -.377         -.105        .177
 7      .660          .404          .073        .387          .662          .396          .078        .400
 8      .454          .290         -.484        .473          .458          .296         -.491        .462
 9      .763          .434          .001        .227          .766          .427          .012        .231
TABLE 3

Details of the GLS Minimization for Data 2

Iteration   Type   Function      Max. correction   Max. gradient
    0        --    0.08765774    9.31 x 10^-1      9.89 x 10^-2
    1        E     0.05525694    4.09 x 10^-1      2.23 x 10^-2
    2        E     0.02892803    4.65 x 10^-1      2.17 x 10^-2
    3        E     0.02574564    8.74 x 10^-1      1.53 x 10^-2
    4        E     0.02418829    3.63 x 10^?       7.71 x 10^-5
    5        E     0.02401558    1.97 x 10^?       1.10 x 10^-5
    6        H     0.02398666    1.00 x 10^0       3.55 x 10^-4
    7        H     0.02398479    1.37 x 10^-2      2.99 x 10^-5
    8        H     0.02398478    1.49 x 10^-5      4.81 x 10^-7
    9        H     0.02398478    1.52 x 10^-5      4.81 x 10^?


TABLE 4

Solutions for Data 2

                             GLS                                                       ML
 i   \lambda_{i1}  \lambda_{i2}  \lambda_{i3}  \lambda_{i4}  \psi_i^2    \lambda_{i1}  \lambda_{i2}  \lambda_{i3}  \lambda_{i4}  \psi_i^2
 1     -.188         -.756          .034         -.100        .377         -.188          .753         -.035         -.108        .385
 2     -.120         -.468          .095          .382        .604         -.120          .468         -.103          .365        .623
 3     -.186         -.763          .157          .221        .309         -.186          .767         -.167          .217        .301
 4     -.175         -.527          .198          .135        .629         -.173          .526         -.200          .124        .638
 5     -.129         -.678          .258         -.345        .336         -.129          .672         -.251         -.349        .347
 6      .359          .259          .157         -.047        .767          .359         -.259         -.154         -.048        .778
 7      .448          .501          .504          .059        .289          .448         -.504         -.507          .052        .286
 8     1.000         -.000         -.000          .000        .000         1.000          .000          .000          .000        .000
 9      .429          .282          .212         -.051        .680          .429         -.282         -.209         -.053        .690
10      .516          .232          .505         -.020        .580          .316         -.232         -.496         -.029        .600
A. Appendix

A1. Matrix Derivative of Function $G(\Lambda,\Psi)$

To obtain the matrix derivatives we use matrix differentials. In
general, $dX = (dx_{ij})$ will denote a matrix of differentials. If $F$ is
a function of $X$ and $dF = \mathrm{tr}(C\,dX')$ then $\partial F/\partial X = C$. Since
$d\,\mathrm{tr}(A) = \mathrm{tr}(dA)$, we have for a fixed $\Psi$, with $G$ defined by (6)
and $\Sigma$ by (2),

       $dG = \frac{1}{2}\,d\,\mathrm{tr}(S^{-1}\Sigma - I)^2$
       $= \frac{1}{2}\mathrm{tr}[d(S^{-1}\Sigma - I)^2]$
       $= \mathrm{tr}[(S^{-1}\Sigma - I)\,d(S^{-1}\Sigma - I)]$
       $= \mathrm{tr}[(S^{-1}\Sigma - I)S^{-1}d\Sigma]$
       $= \mathrm{tr}[(S^{-1}\Sigma - I)S^{-1}(\Lambda\,d\Lambda' + d\Lambda\,\Lambda')]$
       $= 2\,\mathrm{tr}[(S^{-1}\Sigma - I)S^{-1}\Lambda\,d\Lambda']$
       $= 2\,\mathrm{tr}[S^{-1}(\Sigma - S)S^{-1}\Lambda\,d\Lambda']$ .

Hence, the derivative $\partial G/\partial\Lambda$ is that given by (12).
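The matrix derivative can be checked against central finite differences. The sketch below does so on a small made-up example; the function names and the numerical values of `S`, `Lam` and `psi2` are illustrative assumptions.

```python
import numpy as np


def G(S, Lam, psi2):
    """GLS criterion for Sigma = Lam Lam' + diag(psi2)."""
    Sigma = Lam @ Lam.T + np.diag(psi2)
    M = np.linalg.solve(S, Sigma) - np.eye(S.shape[0])
    return 0.5 * np.trace(M @ M)


def dG_dLam(S, Lam, psi2):
    """Analytic derivative 2 S^{-1} (Sigma - S) S^{-1} Lam."""
    Sigma = Lam @ Lam.T + np.diag(psi2)
    S_inv = np.linalg.inv(S)
    return 2.0 * S_inv @ (Sigma - S) @ S_inv @ Lam


def numeric_dG_dLam(S, Lam, psi2, h=1e-6):
    """Central finite differences, one loading at a time."""
    out = np.zeros_like(Lam)
    for i in range(Lam.shape[0]):
        for j in range(Lam.shape[1]):
            step = np.zeros_like(Lam)
            step[i, j] = h
            out[i, j] = (G(S, Lam + step, psi2) - G(S, Lam - step, psi2)) / (2.0 * h)
    return out


# Made-up inputs for illustration.
S = np.array([[1.0, 0.5, 0.3, 0.2], [0.5, 1.0, 0.4, 0.3],
              [0.3, 0.4, 1.0, 0.5], [0.2, 0.3, 0.5, 1.0]])
Lam = np.array([[0.7], [0.6], [0.5], [0.4]])
psi2 = np.array([0.5, 0.6, 0.7, 0.8])

if __name__ == "__main__":
    print(np.max(np.abs(dG_dLam(S, Lam, psi2) - numeric_dG_dLam(S, Lam, psi2))))
```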

A2. Matrix Derivatives of Characteristic Roots and Vectors

The characteristic roots $\gamma_m$ and vectors $\omega_m$, m = 1,2,...,p, of $A$
are defined by

(A1)   $A\omega_m = \gamma_m\omega_m$ ,  m = 1,2,...,p ,

(A2)   $\omega_m'\omega_m = 1$ ,

(A3)   $\omega_m'\omega_n = 0$ ,  n $\ne$ m .

Differentiation of these equations gives

(A4)   $dA\,\omega_m + A\,d\omega_m = d\gamma_m\,\omega_m + \gamma_m\,d\omega_m$ ,

(A5)   $\omega_m'\,d\omega_m = 0$ ,

(A6)   $\omega_m'\,d\omega_n + d\omega_m'\,\omega_n = 0$ ,  n $\ne$ m .

Premultiplication of (A4) by $\omega_m'$ and use of (A1) and (A5) gives

(A7)   $d\gamma_m = \omega_m'\,dA\,\omega_m$ .

Let $\epsilon_{mn} = \omega_m'\,dA\,\omega_n = \epsilon_{nm}$ for m,n = 1,2,...,p. Then
premultiplication of (A4) by $\omega_n'$ for n $\ne$ m and use of (A1) and (A3)
gives

       $\epsilon_{mn} = \gamma_m\omega_n'\,d\omega_m - \omega_n'A\,d\omega_m = \gamma_m\omega_n'\,d\omega_m - \gamma_n\omega_n'\,d\omega_m = (\gamma_m - \gamma_n)\,\omega_n'\,d\omega_m$ ,

or

(A8)   $\omega_n'\,d\omega_m = \frac{\epsilon_{mn}}{\gamma_m - \gamma_n}$ ,  n $\ne$ m .

Multiplying this equation by $\omega_n$, summing over n $\ne$ m, using (A5) and
remembering that

       $\sum_{n\ne m}\omega_n\omega_n' = I - \omega_m\omega_m'$ ,

gives

(A9)   $d\omega_m = \sum_{n\ne m}\frac{\epsilon_{mn}}{\gamma_m - \gamma_n}\,\omega_n$ .

The merits of (A7) and (A9) are that they express the differentials of
$\gamma_m$ and $\omega_m$ in terms of the differentials of $A$.

In our problem we have $A = \Psi S^{-1}\Psi$ as a function of $\Psi$ so that

       $dA = d\Psi\,S^{-1}\Psi + \Psi S^{-1}d\Psi = d\Psi\,\Psi^{-1}A + A\Psi^{-1}d\Psi$ .

Substitution of this into the definition of $\epsilon_{mn}$ gives

       $\epsilon_{mn} = \omega_m'\,dA\,\omega_n$
       $= \omega_m'\,d\Psi\,\Psi^{-1}A\,\omega_n + \omega_m'A\Psi^{-1}d\Psi\,\omega_n$
       $= (\gamma_m + \gamma_n)\,\omega_m'\,d\Psi\,\Psi^{-1}\omega_n$
       $= (\gamma_m + \gamma_n)\,\mathrm{tr}(\omega_n\omega_m'\Psi^{-1}d\Psi)$ .

With this result we have

(A10)  $d\gamma_m = 2\gamma_m\,\mathrm{tr}(\omega_m\omega_m'\Psi^{-1}d\Psi)$

and

(A11)  $d\omega_m = \sum_{n\ne m}\frac{\gamma_m + \gamma_n}{\gamma_m - \gamma_n}\,\mathrm{tr}(\omega_n\omega_m'\Psi^{-1}d\Psi)\,\omega_n$ .

Hence the derivatives of $\gamma_m$ and $\omega_{im}$ with respect to $\psi_j$ are

(A12)  $\partial\gamma_m/\partial\psi_j = (2/\psi_j)\gamma_m\omega_{jm}^2$

and

(A13)  $\partial\omega_{im}/\partial\psi_j = (1/\psi_j)\,\omega_{jm}\sum_{n\ne m}\frac{\gamma_m + \gamma_n}{\gamma_m - \gamma_n}\,\omega_{in}\omega_{jn}$ ,

which are the results used in section 4.
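The derivative of the characteristic roots with respect to the unique standard deviations can likewise be verified numerically. In the sketch below the function names, the AR(1)-type test matrix and the trial values of psi are illustrative assumptions; the test matrix is chosen so that the roots are distinct, which keeps the finite differences well behaved.

```python
import numpy as np


def roots(S_inv, psi):
    """Characteristic roots of Psi S^{-1} Psi in ascending order."""
    return np.linalg.eigvalsh(psi[:, None] * S_inv * psi[None, :])


def droots_dpsi(S_inv, psi):
    """Matrix with entry [m, j] = d gamma_m / d psi_j = (2 / psi_j) gamma_m omega_jm^2."""
    gam, om = np.linalg.eigh(psi[:, None] * S_inv * psi[None, :])
    return 2.0 * gam[:, None] * (om.T ** 2) / psi[None, :]


# Illustrative 5 x 5 matrix with entries 0.5^|i-j| (not data from the paper).
idx = np.arange(5)
S_inv = np.linalg.inv(0.5 ** np.abs(np.subtract.outer(idx, idx)))
psi = np.array([0.8, 0.7, 0.75, 0.65, 0.9])

if __name__ == "__main__":
    h = 1e-6
    numeric = np.zeros((5, 5))
    for j in range(5):
        e = np.zeros(5)
        e[j] = h
        numeric[:, j] = (roots(S_inv, psi + e) - roots(S_inv, psi - e)) / (2.0 * h)
    print(np.max(np.abs(numeric - droots_dpsi(S_inv, psi))))
```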